Using a novel clumpiness measure to unite data with metadata: Finding common sequence patterns in immune receptor germline V genes

Authors

  • Gregory W. Schwartz
  • Ali Shokoufandeh
  • Santiago Ontañón
  • Uri Hershberg

Abstract

When finding relationships in biological systems, we often describe hierarchies based on one facet of the data. However, when this hierarchy is used to elucidate relationships between metadata, the distribution of metadata labels within it may exhibit different levels of aggregation: uniform, random, or clumped. To date, no measure exists for quantifying the level of aggregation, or “clumpiness”, between labels distributed among the leaves of a hierarchical container. We propose a clumpiness measure to aid in quantifying relationships between metadata. We validated our measure with random trees and found that, compared with the closest alternative measures, it is more resistant to changes in tree size, label size, and the number of label types. We used our clumpiness measure to quantify the relationships between light and heavy chains in human and mouse B cell and T cell receptor V genes based on their motifs. We found that the B cell heavy chains were the most aggregated and the T cell chains the least aggregated, and that, of all the B cell chains, the IGL chain clumped most with the T cell chains. © 2016 Elsevier B.V. All rights reserved.
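
The abstract does not give the formula for the proposed clumpiness measure, so the sketch below does not reproduce it. As a hedged illustration of the underlying question (are two labels clumped together among a tree's leaves, or spread at random?), it swaps in a generic label-permutation test: it computes the mean path distance between leaves carrying labels `a` and `b`, then compares that value with the same statistic after randomly shuffling the labels over the fixed tree topology. The nested-tuple tree format and all helper names (`leaf_paths`, `path_distance`, `mean_label_distance`, `aggregation_score`) are hypothetical choices made for this example.

```python
import itertools
import random
from statistics import mean


def leaf_paths(tree, path=()):
    """Yield (label, path) for each leaf of a nested-tuple tree.

    Leaves are strings; `path` is the tuple of child indices from the root.
    (Hypothetical tree format chosen for this sketch.)
    """
    if isinstance(tree, str):
        yield tree, path
        return
    for i, child in enumerate(tree):
        yield from leaf_paths(child, path + (i,))


def path_distance(p, q):
    """Number of edges between two leaves, given their root-to-leaf paths."""
    shared = 0  # depth of the lowest common ancestor
    for x, y in zip(p, q):
        if x != y:
            break
        shared += 1
    return (len(p) - shared) + (len(q) - shared)


def mean_label_distance(labels, paths, a, b):
    """Mean edge distance over all pairs of distinct leaves labeled a and b.

    When a == b, this averages over pairs of leaves that both carry label a.
    Raises statistics.StatisticsError if no such pair exists.
    """
    return mean(
        path_distance(paths[i], paths[j])
        for i, j in itertools.combinations(range(len(labels)), 2)
        if {labels[i], labels[j]} == {a, b}
    )


def aggregation_score(tree, a, b, n_perm=1000, seed=0):
    """Permutation test: observed vs. label-shuffled mean distance.

    Returns (ratio, p_value). A ratio below 1 means leaves labeled a and b
    sit closer together than under random labeling (suggesting clumping);
    p_value is the one-sided fraction of shuffles at least as close as the
    observed arrangement, with a +1 correction.
    """
    labels, paths = zip(*leaf_paths(tree))
    labels = list(labels)
    observed = mean_label_distance(labels, paths, a, b)
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # keep the topology, permute the labels
        null.append(mean_label_distance(shuffled, paths, a, b))
    expected = mean(null)
    p_value = (1 + sum(d <= observed for d in null)) / (1 + n_perm)
    return observed / expected, p_value
```

For example, `aggregation_score(((("A", "A"), ("A", "A")), (("B", "B"), ("B", "B"))), "A", "A")` should return a ratio below 1 with a small empirical p-value, since all `A` leaves sit in one subtree. A ratio near 1 would suggest a random arrangement, and a ratio above 1 a more dispersed one. This statistic has not been validated for robustness to tree size, label size, or number of label types.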

Similar sources

IMPre: An Accurate and Efficient Software for Prediction of T- and B-Cell Receptor Germline Genes and Alleles from Rearranged Repertoire Data

Large-scale study of the properties of T-cell receptor (TCR) and B-cell receptor (BCR) repertoires through next-generation sequencing is providing excellent insights into the understanding of adaptive immune responses. Variable(Diversity)Joining [V(D)J] germline genes and alleles must be characterized in detail to facilitate repertoire analyses. However, most species do not have well-characteri...

A Novel Method for Detection of Epilepsy in Short and Noisy EEG Signals Using Ordinal Pattern Analysis

Introduction: In this paper, a novel complexity measure is proposed to detect dynamical changes in nonlinear systems using ordinal pattern analysis of time series data taken from the system. Epilepsy is considered a dynamical change in the nonlinear and complex brain system. The ability of the proposed measure to characterize the normal and epileptic EEG signals when the signal is short or is...

Three novel germline mutations in the MLH1 gene in patients with hereditary colorectal cancer

Abstract Background: Hereditary non-polyposis colorectal cancer is the most common cause of early-onset hereditary colorectal cancer. In the majority of hereditary non-polyposis colorectal cancer families, microsatellite instability and germline mutations in one of the DNA mismatch repair genes, including MSH2, MLH1, MSH6 and PMS2, are found. The objective of this study was to determine th...

Lamarck and Immunity: Somatic and Germline Evolution of Antibody Genes

Current work on the mechanism of hypermutation of somatically rearranged antibody variable (V) genes shows that the most likely mechanism involves both direct DNA modification (deamination of cytosines to uracils by AID deaminase) and strand nicking plus mRNA editing (deamination of adenosine to inosine via the ADAR1 deaminase) coupled to a reverse transcription process to fix RNA sequence modi...

Journal title:
  • Pattern Recognition Letters

Volume 74

Publication date: 2016